Compression of word embeddings for natural language processing systems

ABSTRACT

Described herein are systems and methods that provide a natural language processing system (NLPS) that employs compressed word embeddings. An auto-encoder that includes encoder circuitry and decoder circuitry can be used to produce the compressed word embeddings. The decoder circuitry is trained to decompress the word embeddings with reduced or minimal differences between the original uncompressed word embeddings and the corresponding decompressed word embeddings. One or more parameters of the trained decoder circuitry are transferred to the NLPS, where the NLPS is then trained using the compressed word embeddings to improve the correctness of the responses or actions determined by the NLPS.

BACKGROUND

Comprehension of natural language by machines, at a near-human level, is a major goal for Artificial Intelligence. Indeed, most human knowledge is collected in the natural language of text. Machine comprehension of unstructured, real-world text has therefore garnered significant attention from scientists, engineers, and scholars. This is due, at least in part, to the fact that many natural language processing tasks, such as information extraction, relation extraction, text summarization, or machine translation, depend implicitly or explicitly on a machine's ability to understand and reason with natural language.

Many natural language processing systems (NLPS) employ word embeddings that model or represent words and phrases from a vocabulary. The word embeddings typically map the words and phrases to vectors of real numbers. When a language input is received, a NLPS obtains the corresponding word embedding for some or all of the words in the language input. In some instances, the word embeddings are stored in a matrix that can be quite large. For example, a large vocabulary can produce a large matrix, or the language type (e.g., English) can result in a large matrix. It can be difficult to store a large matrix in an electronic device that has a limited amount of memory.

It is with respect to these and other general considerations that embodiments have been described. Also, although relatively specific problems have been discussed, it should be understood that the embodiments should not be limited to solving the specific problems identified in the background.

SUMMARY

Embodiments disclosed herein provide a natural language processing system that employs compressed word embeddings. In one aspect, a system includes an auto-encoder processing unit, a first storage device, and a second storage device. The auto-encoder processing unit includes encoder circuitry and decoder circuitry. The first storage device stores computer executable instructions that, when executed by the auto-encoder processing unit, perform a method. The method includes compressing, by the encoder circuitry, one or more uncompressed word embeddings to produce one or more compressed word embeddings. The one or more compressed word embeddings are decompressed by the decoder circuitry. In one embodiment, each of the one or more uncompressed word embeddings includes a vector of real numbers, each of the one or more compressed word embeddings comprises a vector of binary numbers, and each of the one or more decompressed word embeddings comprises a vector of real numbers. The second storage device stores one or more parameters of the decoder circuitry.

In another aspect, a method includes training at a first time a natural language processing system (NLPS) using uncompressed word embeddings and training decoder circuitry in an auto-encoder processing unit with compressed word embeddings that correspond to the uncompressed word embeddings. In one embodiment, each uncompressed word embedding comprises a vector of real numbers and each compressed word embedding includes a vector of binary numbers. The compressed word embeddings are produced by encoder circuitry in the auto-encoder processing unit. One or more parameters in the NLPS are replaced with one or more parameters in the trained decoder circuitry. At a second time, the NLPS is trained using the compressed word embeddings.

In yet another aspect, an electronic device includes an input device for receiving a natural language input, a storage device for storing compressed word embeddings, and a natural language processing system. The natural language processing system includes natural language understanding (NLU) circuitry that is connected to the storage device, and processing circuitry operably connected to the NLU circuitry. The NLU circuitry obtains one or more compressed word embeddings that represent at least one word in the natural language input. The processing circuitry receives the compressed word embeddings, decompresses the compressed word embeddings, and processes the decompressed word embeddings to determine an action to be taken by the electronic device in response to the natural language input. In one embodiment, each compressed word embedding comprises a vector of binary numbers and each decompressed word embedding comprises a vector of real numbers.

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

Non-limiting and non-exhaustive examples are described with reference to the following Figures. The elements of the drawings are not necessarily to scale relative to each other. Identical reference numerals have been used, where possible, to designate identical features that are common to the figures.

FIG. 1 illustrates an example system that can include a natural language processing system;

FIG. 2 is a flowchart depicting a method of operating a natural language processing system that uses compressed word embeddings;

FIG. 3 is a block diagram illustrating an example natural language processing system;

FIG. 4 is a block diagram depicting an example auto-encoder processing unit that may be used to produce compressed word embeddings;

FIG. 5 is a flowchart illustrating an example method of training an auto-encoder processing unit;

FIG. 6 is a process flow diagram depicting an example method of training the decoder circuitry;

FIG. 7 is a flowchart illustrating an example method of training a natural language processing system;

FIG. 8 is a process flow diagram depicting an example method of training the natural language processing system;

FIG. 9 is a block diagram illustrating example physical components of an electronic device with which aspects of the disclosure may be practiced;

FIGS. 10A-10B are simplified block diagrams illustrating a mobile computing device with which aspects of the present disclosure may be practiced; and

FIG. 11 is a block diagram depicting a distributed computing system in which aspects of the present disclosure may be practiced.

DETAILED DESCRIPTION

In the following detailed description, references are made to the accompanying drawings that form a part hereof, and in which are shown by way of illustrations specific embodiments or examples. These aspects may be combined, other aspects may be utilized, and structural changes may be made without departing from the present disclosure. Embodiments may be practiced as methods, systems or devices. Accordingly, embodiments may take the form of a hardware implementation, an entirely software implementation, or an implementation combining software and hardware aspects. The following detailed description is therefore not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims and their equivalents.

Embodiments described herein provide a natural language processing system (NLPS) that employs compressed word embeddings. As described earlier, word embeddings model or represent words and/or phrases from a vocabulary. In one aspect, the word embeddings are stored in a large matrix, making it difficult to store the matrix in an electronic device that has a limited amount of memory.

The compressed word embeddings can be binary numbers that may be stored compactly as bits instead of bytes (e.g., floating-point numbers) in electronic devices that have limited amounts of storage. Additionally, an electronic device may operate a NLPS independent of any other computing devices when the compressed word embeddings are stored in the electronic device.
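
As a rough illustration of the storage saving described above, the following Python sketch packs a binary word embedding into bytes and compares its size with a 32-bit floating-point embedding. The dimensions (300 floats, 1024 bits) and the helper names pack_bits and unpack_bits are assumptions chosen for illustration only; they are not taken from the disclosure.

```python
import struct


def pack_bits(bits):
    """Pack a sequence of 0/1 values into bytes, 8 bits per byte (MSB first)."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 1 << (7 - i % 8)
    return bytes(out)


def unpack_bits(data, n):
    """Recover the first n bits from a packed byte string."""
    return [(data[i // 8] >> (7 - i % 8)) & 1 for i in range(n)]


# Hypothetical sizes: a 300-dimension float32 embedding
# versus a 1024-bit binary embedding.
float_embedding = [0.0] * 300
binary_embedding = [i % 2 for i in range(1024)]

float_bytes = struct.pack(f"{len(float_embedding)}f", *float_embedding)
packed = pack_bits(binary_embedding)

print(len(float_bytes), "bytes as floats;", len(packed), "bytes as packed bits")
```

Under these assumed sizes the binary embedding occupies fewer bytes even though it has more dimensions, because each dimension costs one bit rather than four bytes.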

Embodiments disclosed herein use an auto-encoder that includes encoder circuitry and decoder circuitry. In aspects, the auto-encoder can be implemented as a multi-layer neural network, where the encoder circuitry is one layer and the decoder circuitry is another layer. The encoder circuitry is used to produce the compressed word embeddings. The decoder circuitry is trained to decompress the word embeddings with reduced or minimal differences between the original uncompressed word embeddings and the corresponding decompressed word embeddings. One or more parameters of the trained decoder circuitry are transferred to the NLPS, where the NLPS is then trained using the compressed word embeddings to improve the correctness of the responses or actions determined by the NLPS.

FIG. 1 illustrates an example system that can include a natural language processing system. The system 100 generates and controls responses to natural language inputs (e.g., spoken and textual inputs). The system 100 allows a user 105 to submit the language input through a client-computing device 110. The client-computing device 110 includes one or more input devices 115 that receive the language input. The input device(s) 115 may be any suitable type of input device configured to receive a language input. In non-limiting examples, the input device(s) 115 may be a microphone (using a speech-to-text (STT) application 120) and/or a keyboard.

In some embodiments, the client-computing device 110 is configured to access one or more server-computing devices (represented by server-computing device 125) through one or more networks (represented by network 130) to interact with a natural language processing system (NLPS) 135 stored on one or more storage devices (represented by storage device 140) and executed by server-computing device 125. In one or more embodiments, the network 130 is illustrative of any suitable type of network, for example, an intranet, and/or a distributed computing network (e.g., the Internet) over which the user 105 may communicate with other users and with other computing systems.

The NLPS 135 can include a computer-executable program that may be stored in the storage device 140 and executed by the server-computing device 125. The NLPS 135 receives and processes the language input and determines what action is to be taken in response to the language input. The action may include asking the user 105 for more information or for confirmation through one or more output devices 145 included in the client-computing device 110 or connected to the client-computing device 110. Example output devices 145 include, but are not limited to, a speaker (using a text-to-speech (TTS) application 120) and a display.

In one or more embodiments, the client-computing device 110 is a personal or handheld computing device having both the input and output devices 115, 145. For example, the client-computing device 110 may be one of: a mobile telephone; a smart phone; a tablet; a phablet; a smart watch; a wearable computer; a personal computer; a desktop computer; a laptop computer; a gaming device/computer (e.g., Xbox); a television; and the like. This list of example client-computing devices is for example purposes only and should not be considered as limiting. Any suitable client-computing device that provides and/or interacts with a NLPS using word embeddings may be utilized.

In some aspects, the client-computing device 110 can have limited storage and/or may operate as a stand-alone device (e.g., limited or no access to network 130). The limited access to, or absence of, the network 130 is represented by the dashed line 150 in FIG. 1. In such embodiments, the client-computing device 110 can include a NLPS 155 that accesses compressed word embeddings that are stored in a storage device included in, or connected to, the client-computing device 110. The processes for training the NLPS 155, for compressing the word embeddings, and for decompressing the word embeddings are described in more detail in conjunction with FIGS. 3-8.

As should be appreciated, FIG. 1 is described for purposes of illustrating the present methods and systems and is not intended to limit the disclosure to a particular sequence of steps or a particular combination of hardware or software components.

FIG. 2 is a flowchart depicting a method of operating a NLPS that uses compressed word embeddings. Initially, as shown in block 200, the NLPS receives text input that is, or represents, a natural language input that is received by a computing device (e.g., client-computing device 110 via input device 115 in FIG. 1). For example, in FIG. 1, the user 105 can ask the client-computing device 110 to perform an action, and the spoken input is converted to text using the STT application 120. Example requested actions include, but are not limited to, a request to call a named person, to provide directions to a location, or to find a restaurant.

Based on the text input, the NLPS obtains one or more compressed word embeddings from a storage device (block 205). As described earlier, the word embeddings can be mathematical representations of words and/or phrases in a vocabulary. In some embodiments, the word embeddings map the words and/or phrases to vectors of real numbers. During the compression process, the original mathematical representation of each word embedding (e.g., N real/float numbers) is processed and converted into a compressed second mathematical representation (e.g., M bits, where M can be larger than N). In one embodiment, the compression process is a binary compression process that transforms the real numbers of each word embedding (e.g., a vector of real numbers) into a binary word embedding of ones and zeros (e.g., a vector of binary numbers).

Next, as shown in block 210, the one or more word embeddings are decompressed. Using the decompressed word embedding(s), the NLPS determines an action that a machine (e.g., a client-computing device) should perform in response to the text input (block 215). The action is then performed by the machine at block 220. For example, if the request is to call a friend, the action performed by the machine may be to ask the user to confirm the friend's name and number prior to initiating the call. Alternatively, the action performed by the machine can be to call the friend.

FIG. 3 is a block diagram illustrating an example NLPS. The NLPS 300 can be the NLPS 135 and/or the NLPS 155 in FIG. 1. Although the NLPS 300 includes four blocks or operations, other embodiments are not limited to this configuration. A NLPS may include different and/or additional blocks or operations.

The NLPS 300 includes natural language understanding (NLU) circuitry 305 operably connected to a storage device 310 that stores compressed word embeddings (W_(ec)) 315. In one embodiment, the compressed word embeddings 315 comprise vectors of binary numbers that are stored in a matrix. The NLU circuitry 305 can be implemented with hardware (e.g., circuits), with software, or with a combination of hardware and software.

The NLU circuitry 305 is also operably connected to an input of processing circuitry 320. In one embodiment, the processing circuitry 320 is a neural network, such as a recurrent neural network. An output of the processing circuitry 320 is operably connected to natural language generation (NLG) circuitry 325.

As described earlier, a text input 330 is received by the NLU circuitry 305. Generally, the NLU circuitry 305 converts the text input 330 into a structured input that the processing circuitry 320 can understand and process. The NLU circuitry 305 may analyze the semantic features of the text input 330 and access the storage device 310 to obtain the compressed word embeddings 315 for some or all of the words in the text input 330.

The processing circuitry 320 decompresses the compressed word embeddings, determines or predicts an action to be taken in response to the text input 330, and outputs an internal representation of the determined action. The NLG circuitry 325 receives the internal representation of the determined action and converts the internal representation into a natural language output 335. The natural language output 335 may then be presented to a user via a computing device (e.g., client-computing device 110 using output device 145 in FIG. 1). For example, in FIG. 1, the natural language output 335 can be an audio output that is presented to the user 105 using a speaker (output device 145 via TTS application 120) within or connected to the client-computing device 110.

As described earlier, in some embodiments, compressed word embeddings consume a reduced amount of storage compared to uncompressed word embeddings. This allows the compressed word embeddings to be stored in a client-computing device and used by a NLPS that is also included in the client-computing device. Compressed word embeddings may be stored in the client-computing device when the client-computing device has limited storage, limited, intermittent, or no network access, and/or when the client-computing device is to operate as a stand-alone computing device.

As described earlier, an auto-encoder can be used to compress the word embeddings. FIG. 4 is a block diagram depicting an auto-encoder processing unit that may be used to produce compressed word embeddings. The auto-encoder processing unit 400 includes encoder circuitry 405 operably connected to activation function circuitry 410. The encoder circuitry 405 and the activation function circuitry 410 are used to produce the compressed word embeddings. The auto-encoder processing unit 400 further includes decoder circuitry 415 operably connected to the activation function circuitry 410. In one embodiment, the auto-encoder processing unit 400 is implemented as a multi-layer neural network with one layer comprising the encoder circuitry 405, a second layer including the activation function circuitry 410, and a third layer comprising the decoder circuitry 415. One example of the multi-layer neural network is a multi-layer bi-directional recurrent neural network. As will be described in more detail later, one or more parameters of the decoder circuitry 415 can be used in a NLPS.

An input of the encoder circuitry 405 is operably connected to a storage device 420 that stores uncompressed word embeddings (W_(e)) 425. In one embodiment, the uncompressed word embeddings 425 are vectors of real numbers that are stored in a matrix and each uncompressed word embedding (W_(e)) 425 is received by the encoder circuitry 405. In one embodiment, the encoder circuitry 405 is a linear transformation circuit that transforms the mathematical representation (e.g., real number or vector) of each word embedding from a linear space to another linear space.

The higher dimensional mathematical representation is then received by the activation function circuitry 410. In one embodiment, the activation function circuitry 410 is a non-linear function or program that transforms the higher dimensional mathematical representation (e.g., real number) into a binary representation or number. In one embodiment, the activation function circuitry 410 operates by the following equation:

$\begin{matrix}{{f(x)} = \{ {\begin{matrix}{0,} & {{{if}\mspace{14mu} x} < 0} \\{1,} & {{{if}\mspace{14mu} x} > 0}\end{matrix},} } & {{Equation}\mspace{14mu} 1}\end{matrix}$

where x is an input value. If x equals 0, then the value of f(x) can be chosen to be zero or one. A threshold other than zero (e.g., any real number) can be used in other embodiments. Additionally or alternatively, the equation for f(x) can include a value for f(x) that is used when x is equal to or less than (or greater than) the threshold. For example, f(x) can be 0 if x is less than 0.5 and 1 if x is greater than or equal to 0.5.
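
A minimal Python sketch of an activation function in the spirit of Equation 1 is shown below. The function name binarize and the parameters threshold and value_at_threshold are hypothetical; they simply expose the two choices left open by the text (the threshold, and the output when the input equals the threshold).

```python
def binarize(x, threshold=0.0, value_at_threshold=1):
    """Map a real number to 0 or 1 by comparing it with a threshold.

    threshold and value_at_threshold are illustrative parameters:
    Equation 1 uses a threshold of zero and leaves f(0) open.
    """
    if x < threshold:
        return 0
    if x > threshold:
        return 1
    return value_at_threshold


# Equation 1 with f(0) chosen to be one.
print([binarize(v) for v in (-0.3, 0.0, 2.0)])

# The alternative example from the text: 0 below 0.5, 1 at or above 0.5.
print([binarize(v, threshold=0.5) for v in (0.4, 0.5, 0.9)])
```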

The compressed word embeddings (W_(ec)) 430 are output from the activation function circuitry 410 and received by the decoder circuitry 415. In one embodiment, the compressed word embeddings 430 are vectors of binary numbers that may be stored in a storage device 435 that is separate from and not part of the auto-encoder processing unit 400.

The decoder circuitry 415 acts as a decompression circuit that transforms the compressed word embedding (W_(ec)) 430 back into a mathematical representation in the original space (e.g., a real number). In one embodiment, the decoder circuitry 415 is a linear transformation circuit that maps the binary representation back to the original space.

The decompressed word embeddings (W_(e)*) 440 are vectors of real numbers that can be stored in a storage device 445. Typically, as indicated by the asterisk, the mathematical representation of a decompressed word embedding (W_(e)*) 440 does not match or equal the corresponding mathematical representation of the original uncompressed word embedding (W_(e)). As will be described in more detail later in conjunction with FIGS. 5 and 6, a training process is used to train the decoder circuitry 415 to minimize the differences between the mathematical representations of the decompressed word embedding (W_(e)*) 440 and the corresponding original uncompressed word embedding (W_(e)). Once trained, one or more parameters of the decoder circuitry 415 can be used in a NLPS.
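
The data path of FIG. 4 can be sketched in Python as a linear encoder, a binary activation, and a linear decoder. This is only an illustrative sketch under stated assumptions: the dimensions (4 real numbers in, 16 bits out), the Gaussian initialization, the choice f(0) = 1, and the names matvec, encode, and decode are all hypothetical. The decoder here is untrained, so the decompressed vector is not expected to match the input.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def encode(w_e, encoder_weights):
    """Linear transformation followed by a binary activation."""
    projected = matvec(encoder_weights, w_e)
    return [1 if x >= 0 else 0 for x in projected]


def decode(w_ec, decoder_weights):
    """Linear transformation from the binary space back to the original space."""
    return matvec(decoder_weights, w_ec)


rng = random.Random(0)
d_i, d_o = 4, 16  # assumed input and output dimensions

# Randomly initialized weights (assumption; the decoder would normally be trained).
encoder_weights = [[rng.gauss(0, 1) for _ in range(d_i)] for _ in range(d_o)]
decoder_weights = [[rng.gauss(0, 0.1) for _ in range(d_o)] for _ in range(d_i)]

w_e = [0.2, -1.3, 0.7, 0.05]               # uncompressed word embedding
w_ec = encode(w_e, encoder_weights)        # compressed (binary) word embedding
w_e_star = decode(w_ec, decoder_weights)   # decompressed word embedding

print(w_ec)
print(w_e_star)
```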

FIG. 5 is a flowchart illustrating an example method of training an auto-encoder. Initially, as shown in block 500, one or more parameters for the encoder circuitry are determined. The parameter(s) of the encoder circuitry may be determined using a variety of techniques. In one embodiment, the parameter(s) are randomly initialized and fixed. With randomly initialized parameter(s), the encoder circuitry transforms the word embeddings from a d_(i)-dim manifold (e.g., a linear sub-space) onto a d_(o)-dim space, where d_(o) is typically chosen to be larger than d_(i). With a probability of 1, this transformation retains the algebraic relations of the word embeddings. In one embodiment, when the transformed word embeddings are processed by the activation function circuitry (e.g., activation function circuitry 410 in FIG. 4), the activation function is a non-linear transformation that produces a binary value for the word embeddings.

The activation function circuitry stretches in the d_(o)-dim space. This binary value transformation operation substantially keeps most of the original word relationships. For example, if a word A is closer to word B than to word C in the original space, these relationships may be substantially maintained after processing by the activation function circuitry.

Alternatively, the one or more parameters of the encoder circuitry may be determined by training the encoder circuitry. For example, the encoder circuitry can be trained by using a first activation function with binary output in the forward propagation phase and a second continuous activation function that outputs any value ranging from 0 to 1 in the backward propagation phase. An example first activation function for the forward propagation phase is ƒ_(ƒ)(x)=1 (x≥0). An example second activation function for the backward propagation phase is:

$\begin{matrix}{{f_{b}(x)} = \{ {\begin{matrix}{0,} & {{{if}\mspace{14mu} x} \leq {- c}} \\{{{( {x + c} )/2}c},} & {{{if}\mspace{14mu} {x}} < c} \\{1,} & {{{if}\mspace{14mu} x} \geq c}\end{matrix},} } & {{Equation}\mspace{14mu} 2}\end{matrix}$

where x is the input and c is a fixed positive number. The variable c can have a different value in other embodiments (e.g., any real number). Alternatively, ƒ_(b) can be a sigmoid-like function.
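
The two activation functions can be sketched in Python as follows. The forward function follows ƒ_(ƒ)(x)=1 (x≥0) and the backward function follows Equation 2. The derivative helper f_backward_grad is an assumption about how the continuous function might be used during backward propagation (its slope stands in for the slope of the binary step); the text does not spell out that step. The default c = 1.0 is likewise an arbitrary illustrative value.

```python
def f_forward(x):
    """Binary activation used in the forward propagation phase."""
    return 1 if x >= 0 else 0


def f_backward(x, c=1.0):
    """Continuous activation of Equation 2, with outputs between 0 and 1."""
    if x <= -c:
        return 0.0
    if x >= c:
        return 1.0
    return (x + c) / (2 * c)


def f_backward_grad(x, c=1.0):
    """Slope of f_backward (assumed use: gradient during backward propagation)."""
    return 1.0 / (2 * c) if -c < x < c else 0.0


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, f_forward(x), f_backward(x), f_backward_grad(x))
```

A larger c widens the region in which the slope is non-zero, which is one reason c might be treated as a tunable value.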

After the one or more parameters of the encoder circuitry are determined, the word embeddings are compressed using the encoder circuitry and the activation function circuitry (block 505). At block 510, the compressed word embeddings are stored in a storage device (e.g., storage device 435 in FIG. 4). The decoder circuitry is then trained using the compressed word embeddings (block 515). During the training process, one or more parameters of the decoder circuitry may be adjusted to reduce or minimize the differences between the original uncompressed word embeddings and the decompressed word embeddings. As described earlier, in one embodiment, the decoder circuitry is a layer in a neural network. Thus, one or more network parameters in the neural network can be updated to reduce or minimize the differences between the original uncompressed word embeddings and the decompressed word embeddings. An example training process is described in more detail in conjunction with FIG. 6.

Next, as shown in block 520, the final decoder parameters are stored in a storage device. In some embodiments, the one or more parameters of the decoder circuitry 415 can be stored in a storage device, such as the storage device 435. As will be described in more detail in conjunction with FIG. 7, the decoder parameters are used in the NLPS to train the NLPS.

FIG. 6 is a process flow diagram depicting an example method of training the decoder circuitry. This operation may be performed at block 515 in FIG. 5. The training process includes the decoder circuitry 600 receiving one or more compressed word embeddings (W_(ec)) 605. The decoder circuitry 600 decompresses the compressed word embedding(s) 605 to produce decompressed word embeddings (W_(e)*) 610. As discussed earlier, one or more decompressed word embeddings 610 may not equal or match a corresponding original uncompressed word embedding W_(e) 615. The asterisk in FIG. 6 represents the difference in the decompressed word embeddings 610.

To eliminate or reduce the differences between the decompressed word embedding(s) 610 and the corresponding original uncompressed word embedding(s) 615, the decompressed and the corresponding original word embeddings 610, 615 are received and compared by a processing unit 620. Any suitable processing unit 620 may be used. For example, in one embodiment, the processing unit 620 is a comparator circuit. Based on the comparison, the processing unit 620 provides one or more compensation or correction values (CV) 625 to the decoder circuitry 600. One or more parameters of the decoder circuitry 600 can be updated based on the compensation value(s) 625.

The training process repeats a given number of times or until the differences between the decompressed and the original uncompressed word embeddings are at a given amount or level, or do not decrease any more. For example, in one embodiment, the training process repeats until the differences between the decompressed and the original uncompressed word embeddings equal or are less than a threshold value (or do not decrease).
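
The training loop of FIG. 6 can be sketched in Python as gradient descent on a linear decoder. This is one possible realization, not a method stated in the disclosure: the mean squared error as the measure of difference, the gradient as the correction values, the learning rate, the dimensions, and the names train_decoder and mean_squared_error are all assumptions. The three stopping conditions mirror the text: a given number of repetitions, a threshold on the difference, or the difference no longer decreasing.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def mean_squared_error(decoder, codes, originals):
    """Average squared difference between decompressed and original embeddings."""
    total, count = 0.0, 0
    for code, original in zip(codes, originals):
        for out, target in zip(matvec(decoder, code), original):
            total += (out - target) ** 2
            count += 1
    return total / count


def train_decoder(codes, originals, lr=0.1, max_epochs=500, threshold=1e-4):
    """Full-batch gradient descent; lr is assumed small enough for these sizes."""
    n, d_i, d_o = len(codes), len(originals[0]), len(codes[0])
    decoder = [[0.0] * d_o for _ in range(d_i)]
    history = [mean_squared_error(decoder, codes, originals)]
    for _ in range(max_epochs):
        # Compare decompressed and original embeddings; accumulate corrections.
        grad = [[0.0] * d_o for _ in range(d_i)]
        for code, original in zip(codes, originals):
            output = matvec(decoder, code)
            for r in range(d_i):
                error = output[r] - original[r]
                for col in range(d_o):
                    grad[r][col] += 2.0 * error * code[col] / (n * d_i)
        # Update the decoder parameters with the correction values.
        for r in range(d_i):
            for col in range(d_o):
                decoder[r][col] -= lr * grad[r][col]
        loss = mean_squared_error(decoder, codes, originals)
        history.append(loss)
        # Stop at the threshold or when the difference no longer decreases.
        if loss <= threshold or loss >= history[-2]:
            break
    return decoder, history


rng = random.Random(0)
d_i, d_o, n_words = 4, 16, 8  # assumed toy sizes
originals = [[rng.gauss(0, 1) for _ in range(d_i)] for _ in range(n_words)]
encoder = [[rng.gauss(0, 1) for _ in range(d_i)] for _ in range(d_o)]
codes = [[1 if x >= 0 else 0 for x in matvec(encoder, w)] for w in originals]

decoder, history = train_decoder(codes, originals)
print("initial difference:", history[0], "final difference:", history[-1])
```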

FIG. 7 is a flowchart illustrating an example method of training a natural language processing system. Initially, as shown in block 700, a NLPS is trained using original uncompressed word embeddings to produce a first set of actions that have been determined by the NLPS. In one embodiment, the NLPS is trained using word embeddings from one or more known datasets. After the training process, one or more parameters of the NLPS are updated or replaced with the parameter(s) of the trained decoder circuitry (block 705). As described earlier, in one embodiment, the NLPS is implemented as a neural network. Thus, one or more network parameters in the neural network can be updated with the parameter(s) of the decoder circuitry.

Thereafter, at block 710, the NLPS is trained a second time using compressed word embeddings that correspond to the original uncompressed word embeddings. In one embodiment, an auto-encoder described in conjunction with FIGS. 4-6 is used to generate the compressed word embeddings. During the training process, a second set of actions is determined by the NLPS based on the compressed word embeddings. One or more parameters in the NLPS are updated to improve the correctness or accuracy of the actions determined by the NLPS. In other words, the parameter(s) in the NLPS are adjusted to have the second set of actions match, substantially match, or be closer to the first set of actions produced at block 700.

FIG. 8 is a process flow diagram depicting an example method of training the NLPS using the compressed word embeddings. This operation may be performed at block 710 in FIG. 7. The training process includes the NLPS 800 receiving one or more compressed word embeddings (W_(ec)) 805. The NLPS 800 decompresses the compressed word embedding(s) 805 and processes the decompressed word embeddings to produce one or more predicted actions 810.

To eliminate or reduce errors in the processing of the decompressed word embeddings, which improves the accuracy of the predicted action(s), the predicted action(s) and the corresponding expected action(s) 815 are received and compared by a processing unit 820. Any suitable processing unit 820 may be used. Based on the comparison, the processing unit 820 provides one or more compensation or correction values (CV) 825 to the NLPS 800. One or more parameters of the NLPS 800 (e.g., one or more network parameters of the neural network) can be updated based on the compensation value(s) 825 to have the predicted actions 810 match, substantially match, or be closer to the expected actions 815.

The training process repeats a given number of times or until the differences between the expected and the predicted actions are at a given level. For example, in one embodiment, the training process repeats until the correctness or accuracy of the predicted responses equals or is greater than a particular confidence value (or does not improve).

FIGS. 9-11 and the associated descriptions provide a discussion of a variety of operating environments in which aspects of the disclosure may be practiced. However, the devices and systems illustrated and discussed with respect to FIGS. 9-11 are for purposes of example and illustration and are not limiting of a vast number of electronic device configurations that may be utilized for practicing aspects of the disclosure, as described herein.

FIG. 9 is a block diagram depicting physical components (e.g., hardware) of an electronic device 900 with which aspects of the disclosure may be practiced. The components described below may be suitable for the computing devices described above, including the client-computing device 110 in FIG. 1.

In a basic configuration, the electronic device 900 may include at least one processing unit 905 and a system memory 910. Depending on the configuration and type of the electronic device, the system memory 910 may comprise, but is not limited to, volatile storage (e.g., random access memory), non-volatile storage (e.g., read-only memory), flash memory, or any combination of such memories. The system memory 910 may include a number of program modules and data files, such as an operating system 915, one or more program modules 920 suitable for parsing received input, determining subject matter of received input, determining actions associated with the input and so on, a NLPS 925, and compressed word embeddings 930. While executing on the processing unit 905, the NLPS 925 may perform and/or cause to be performed processes including, but not limited to, the aspects as described herein.

The operating system 915, for example, may be suitable for controllingthe operation of the electronic device 900. Furthermore, embodiments ofthe disclosure may be practiced in conjunction with a graphics library,other operating systems, or any other application program and is notlimited to any particular application or system. This basicconfiguration is illustrated in FIG. 9 by those components within adashed line 935.

The electronic device 900 may have additional features or functionality.For example, the electronic device 900 may also include additional datastorage devices (removable and/or non-removable) such as, for example,magnetic disks, optical disks, or tape. Such additional storage isillustrated in FIG. 9 by a removable storage device 940 and anon-removable storage device 945.

The electronic device 900 may also have one or more input device(s) 950 such as a keyboard, a trackpad, a mouse, a pen, a sound or voice input device, a touch, force and/or swipe input device, etc. Output device(s) 955 such as a display, speakers, a printer, etc. may also be included. The aforementioned devices are examples and others may be used. The electronic device 900 may include one or more communication devices 960 allowing communications with other electronic devices 965. Examples of suitable communication devices 960 include, but are not limited to, a radio frequency (RF) transmitter, a receiver, and/or transceiver circuitry, network circuitry, and universal serial bus (USB), parallel, and/or serial ports.

The term computer-readable media as used herein may include computer storage media or devices. Computer storage devices may include volatile and nonvolatile, removable and non-removable storage devices implemented in any method or technology for storage of information, such as computer readable instructions, data structures, or program modules.

The system memory 910, the removable storage device 940, and the non-removable storage device 945 are all computer storage device examples (e.g., memory storage). Computer storage devices may include RAM, ROM, electrically erasable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other article of manufacture which can be used to store information and which can be accessed by the electronic device 900. Any such computer storage device may be part of the electronic device 900. Computer storage device does not include a carrier wave or other propagated or modulated data signal.

Communication media may be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term “modulated data signal” may describe a signal that has one or more characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media.

Furthermore, embodiments of the disclosure may be practiced in an electrical circuit comprising discrete electronic elements, packaged or integrated electronic chips containing logic gates, a circuit utilizing a microprocessor, or on a single chip containing electronic elements or microprocessors.

FIGS. 10A and 10B illustrate a mobile electronic device 1000, for example, a mobile telephone, a smart phone, wearable computer (such as a smart watch), a tablet computer, a laptop computer, a navigation device, a gaming device, and the like, with which embodiments of the disclosure may be practiced. With reference to FIG. 10A, one aspect of a mobile electronic device 1000 for implementing the aspects is illustrated.

In a basic configuration, the mobile electronic device 1000 is a handheld computer having both input elements and output elements. The mobile electronic device 1000 typically includes a display 1005 and one or more input buttons 1010 that allow the user to enter information into the mobile electronic device 1000. The display 1005 of the mobile electronic device 1000 may also function as an input device (e.g., a display that accepts touch and/or force input).

If included, an optional side input element 1015 allows further user input. The side input element 1015 may be a rotary switch, a button, or any other type of manual input element. In alternative aspects, mobile electronic device 1000 may incorporate more or fewer input elements. For example, the display 1005 may not be a touch screen in some embodiments. In yet another alternative embodiment, the mobile electronic device 1000 is a portable phone system, such as a cellular phone. The mobile electronic device 1000 may also include an optional keypad 1020. Optional keypad 1020 may be a physical keypad or a “soft” keypad generated on the touch screen display.

In various embodiments, the output elements include the display 1005 for showing a graphical user interface (GUI) and a set of available templates, a visual indicator 1025 (e.g., a light emitting diode), and/or an audio transducer 1030 (e.g., a speaker). In some aspects, the mobile electronic device 1000 incorporates a vibration transducer for providing the user with tactile feedback. In yet another aspect, the mobile electronic device 1000 incorporates input and/or output ports, such as an audio input (e.g., a microphone jack), an audio output (e.g., a headphone jack), and a video output (e.g., an HDMI port) for sending signals to or receiving signals from an external device.

FIG. 10B is a block diagram illustrating the architecture of one aspect of a mobile electronic device 1000. That is, the mobile electronic device 1000 can incorporate a system (e.g., an architecture) 1035 to implement some aspects. In one embodiment, the system 1035 is implemented as a “smart phone” capable of running one or more applications (e.g., browser, e-mail, calendaring, contact managers, messaging clients, games, media clients/players, content selection and sharing applications and so on). In some aspects, the system 1035 is integrated as an electronic device, such as an integrated personal digital assistant (PDA) and wireless phone.

One or more application programs 1040 may be loaded into the memory 1045 and run on or in association with the operating system 1050. Examples of the application programs include phone dialer programs, navigation programs, e-mail programs, personal information management (PIM) programs, word processing programs, spreadsheet programs, Internet browser programs, messaging programs, and so forth.

The system 1035 also includes a non-volatile storage area 1055 within the memory 1045. The non-volatile storage area 1055 may be used to store persistent information that should not be lost if the system 1035 is powered down.

The application programs 1040 may use and store information in the non-volatile storage area 1055, such as an NLPS, compressed word embeddings, and the like. A synchronization application (not shown) also resides on the system 1035 and is programmed to interact with a corresponding synchronization application resident on a host computer to keep the information stored in the non-volatile storage area 1055 synchronized with corresponding information stored at the host computer.

The system 1035 has a power supply 1060, which may be implemented as one or more batteries. The power supply 1060 may further include an external power source, such as an AC adapter or a powered docking cradle that supplements or recharges the batteries.

The system 1035 may also include a radio interface layer 1065 that performs the function of transmitting and receiving radio frequency communications. The radio interface layer 1065 facilitates wireless connectivity between the system 1035 and the “outside world,” via a communications carrier or service provider. Transmissions to and from the radio interface layer 1065 are conducted under control of the operating system 1050. In other words, communications received by the radio interface layer 1065 may be disseminated to the application programs 1040 via the operating system 1050, and vice versa.

The visual indicator 1025 may be used to provide visual notifications, and/or an audio interface 1070 may be used for producing audible notifications via an audio transducer (e.g., audio transducer 1030 illustrated in FIG. 10A). In the illustrated embodiment, the visual indicator 1025 is a light emitting diode (LED) and the audio transducer 1030 may be a speaker. These devices may be directly coupled to the power supply 1060 so that when activated, they remain on for a duration dictated by the notification mechanism even though the processor 1075 and other components might shut down for conserving battery power. The LED may be programmed to remain on indefinitely until the user takes action to indicate the powered-on status of the device.

The audio interface 1070 is used to provide audible signals to and receive audible signals from the user (e.g., voice input such as described above). For example, in addition to being coupled to the audio transducer 1030, the audio interface 1070 may also be coupled to a microphone to receive audible input, such as to facilitate a telephone conversation. In accordance with embodiments of the present disclosure, the microphone may also serve as an audio sensor to facilitate control of notifications, as will be described below.

The system 1035 may further include a video interface 1080 that enables an operation of peripheral device 1085 (e.g., on-board camera) to record still images, video stream, and the like.

A mobile electronic device 1000 implementing the system 1035 may have additional features or functionality. For example, the mobile electronic device 1000 may also include additional data storage devices (removable and/or non-removable) such as magnetic disks, optical disks, or tape. Such additional storage is illustrated in FIG. 10B by the non-volatile storage area 1055.

Data/information generated or captured by the mobile electronic device 1000 and stored via the system 1035 may be stored locally on the mobile electronic device 1000, as described above, or the data may be stored on any number of storage media that may be accessed by the device via the radio interface layer 1065 or via a wired connection between the mobile electronic device 1000 and a separate electronic device associated with the mobile electronic device 1000, for example, a server-computing device in a distributed computing network, such as the Internet (e.g., server-computing device 125 in FIG. 1). As should be appreciated, such data/information may be accessed via the mobile electronic device 1000 via the radio interface layer 1065 or via a distributed computing network. Similarly, such data/information may be readily transferred between electronic devices for storage and use according to well-known data/information transfer and storage means, including electronic mail and collaborative data/information sharing systems.

As should be appreciated, FIG. 10A and FIG. 10B are described for purposes of illustrating the present methods and systems and are not intended to limit the disclosure to a particular sequence of steps or a particular combination of hardware or software components.

FIG. 11 is a block diagram illustrating a distributed system in which aspects of the disclosure may be practiced. The system 1100 allows a user to interact with a NLPS using, or through, a general computing device 1105 (e.g., a desktop computer), a tablet computing device 1110, and/or a mobile computing device 1115. The general computing device 1105, the tablet computing device 1110, and the mobile computing device 1115 can each include the components, or be connected to the components, that are shown associated with the electronic device 900 in FIG. 9.

The general computing device 1105, the tablet computing device 1110, and the mobile computing device 1115 are each configured to access one or more networks (represented by network 1120) to interact with one or more programs (not shown) stored in one or more storage devices (represented by storage device 1125). The program(s) stored on storage device 1125 can be executed on one or more server-computing devices (represented by server-computing device 1130).

In some aspects, the server-computing device 1130 can access and/or receive various types of services, communications, documents and information transmitted from other sources, such as a web portal 1135, mailbox services 1140, directory services 1145, instant messaging services 1150, and/or social networking services 1155. In some instances, these sources may provide robust reporting, analytics, data compilation and/or storage service, etc., whereas other services may provide search engines or other access to data and information, images, videos, document processing and the like.

As should be appreciated, FIG. 11 is described for purposes of illustrating the present methods and systems and is not intended to limit the disclosure to a particular sequence of steps or a particular combination of hardware or software components.

The description and illustration of one or more aspects provided in this application are not intended to limit or restrict the scope of the disclosure as claimed in any way. The aspects, examples, and details provided in this application are considered sufficient to convey possession and enable others to make and use the best mode of claimed disclosure. The claimed disclosure should not be construed as being limited to any aspect, example, or detail provided in this application. Regardless of whether shown and described in combination or separately, the various features (both structural and methodological) are intended to be selectively included or omitted to produce an embodiment with a particular set of features. Having been provided with the description and illustration of the present application, one skilled in the art may envision variations, modifications, and alternate aspects falling within the spirit of the broader aspects of the general inventive concept embodied in this application that do not depart from the broader scope of the claimed disclosure.

1. A system, comprising: an auto-encoder processing unit comprising: encoder circuitry; and decoder circuitry operably connected to the encoder circuitry; and a first storage device storing computer executable instructions that when executed by the auto-encoder processing unit, performs a method comprising: compressing, by the encoder circuitry, one or more uncompressed word embeddings to produce one or more compressed word embeddings for use in a natural language processing system; decompressing, by the decoder circuitry, the one or more compressed word embeddings to produce one or more decompressed word embeddings; and a second storage device storing one or more parameters of the decoder circuitry.
2. The system of claim 1, wherein the auto-encoder further comprises activation function circuitry operably connected to the encoder circuitry and the operation of compressing the one or more uncompressed word embeddings comprises compressing, by the encoder circuitry and the activation circuitry, the one or more uncompressed word embeddings to produce the one or more compressed word embeddings.
3. The system of claim 2, wherein the auto-encoder comprises a multi-layer neural network with the encoder circuitry comprising a first layer, the activation function circuitry a second layer, and the decoder circuitry a third layer.
4. The system of claim 3, wherein the activation function circuitry comprises a non-linear activation function.
5. The system of claim 4, wherein the encoder circuitry comprises a first linear transformation circuit.
6. The system of claim 5, wherein the decoder circuitry comprises a second linear transformation circuit.
7. The system of claim 3, wherein the encoder circuitry comprises one or more parameters that are randomly initialized.
8. The system of claim 3, wherein the encoder circuitry comprises one or more parameters that are determined through a training process.
9. The system of claim 3, wherein the decoder circuitry comprises one or more parameters that are determined through a training process.
10. A method, comprising: training at a first time a natural language processing system (NLPS) using uncompressed word embeddings; training decoder circuitry in an auto-encoder processing unit with compressed word embeddings each comprising a vector of binary numbers that correspond to the uncompressed word embeddings that each comprise a vector of real numbers, the compressed word embeddings produced by encoder circuitry in the auto-encoder processing unit; replacing one or more parameters in the NLPS with one or more parameters in the trained decoder circuitry; and training at a second time the NLPS using the compressed word embeddings.
11. The method of claim 10, further comprising compressing, by encoder circuitry in the auto-encoder processing unit, uncompressed word embeddings to produce the compressed word embeddings.
12. The method of claim 11, wherein the auto-encoder processing circuitry comprises a multi-layer neural network, wherein the encoder circuitry comprises a first layer in the neural network and the decoder circuitry comprises a second layer in the neural network.
13. The method of claim 12, wherein a third layer in the neural network comprises an activation function layer and the operation of compressing the uncompressed word embeddings comprises compressing, by encoder circuitry and the activation function layer in the auto-encoder processing unit, the uncompressed word embeddings to produce the compressed word embeddings.
14. An electronic device, comprising: an input device for receiving a natural language input; a storage device storing compressed word embeddings that each comprise a vector of binary numbers; and a natural language processing system, comprising: a natural language understanding (NLU) circuitry operably connected to the storage device, the NLU circuitry obtaining one or more compressed word embeddings that represent at least one word in the natural language input; and processing circuitry operably connected to the NLU circuitry, wherein the processing circuitry receives the compressed word embeddings, decompresses the compressed word embeddings, and processes the decompressed word embeddings to determine an action to be taken by the electronic device in response to the natural language input, wherein each decompressed word embedding comprises a vector of real numbers.
15. The electronic device of claim 14, wherein the input device comprises a microphone.
16. The electronic device of claim 14, wherein the processing circuitry causes the determined action to be provided to an output device.
17. The electronic device of claim 16, wherein the output device comprises a display.
18. The electronic device of claim 14, further comprising a natural language generation (NLG) circuitry operably connected to the processing circuitry, the NLG circuitry converting the determined action into a natural language output.
19. The electronic device of claim 18, wherein the NLG circuitry causes the natural language output to be provided to an output device.
20. The electronic device of claim 19, wherein the output device comprises a speaker.
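
By way of illustration only, and not as part of the claims, the three-layer arrangement recited above (a linear encoder layer, a non-linear activation layer that yields a vector of binary numbers, and a linear decoder layer that yields a vector of real numbers) might be sketched as follows. The function names, the dimensions, the step-function activation, and the randomly initialized, untrained weights are all assumptions made for this sketch; the training of the decoder parameters is omitted.

```python
import random


def linear(weights, bias, x):
    """Affine map: `weights` holds one row per output element."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def compress(enc_w, enc_b, embedding):
    """Encoder layer followed by a step activation (an assumed choice
    of non-linearity), giving a vector of binary numbers."""
    return [1 if h > 0.0 else 0 for h in linear(enc_w, enc_b, embedding)]


def decompress(dec_w, dec_b, code):
    """Decoder layer: maps a binary code back to a vector of real numbers."""
    return linear(dec_w, dec_b, code)


def random_matrix(rows, cols, rng):
    """Randomly initialized parameters; no training is performed here."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: 4 real values per embedding, 8 bits per compressed code.
EMBED_DIM, CODE_DIM = 4, 8
rng = random.Random(0)
enc_w, enc_b = random_matrix(CODE_DIM, EMBED_DIM, rng), [0.0] * CODE_DIM
dec_w, dec_b = random_matrix(EMBED_DIM, CODE_DIM, rng), [0.0] * EMBED_DIM

embedding = [0.25, -0.5, 0.75, 0.1]          # made-up uncompressed embedding
code = compress(enc_w, enc_b, embedding)      # vector of binary numbers
restored = decompress(dec_w, dec_b, code)     # vector of real numbers
```

In such a sketch, only the binary codes and the decoder parameters (`dec_w`, `dec_b`) would need to be stored on the device; with untrained weights the restored vector is not expected to approximate the original embedding.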